25 research outputs found

    Self-Edit: Fault-Aware Code Editor for Code Generation

    Large language models (LLMs) have demonstrated an impressive ability to generate code on competitive programming tasks. However, with limited sample numbers, LLMs still suffer from poor accuracy. Inspired by the process of human programming, we propose a generate-and-edit approach named Self-Edit that utilizes execution results of the generated code from LLMs to improve the code quality on the competitive programming task. We execute the generated code on the example test case provided in the question and wrap execution results into a supplementary comment. Utilizing this comment as guidance, our fault-aware code editor is employed to correct errors in the generated code. We perform extensive evaluations across two competitive programming datasets with nine different LLMs. Compared to directly generating from LLMs, our approach can improve the average of pass@1 by 89% on APPS-dev, 31% on APPS-test, and 48% on HumanEval over nine popular code generation LLMs with parameter sizes ranging from 110M to 175B. Compared to other post-processing methods, our method demonstrates superior accuracy and efficiency. Comment: Accepted by ACL202
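The execute-and-comment step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `supplementary_comment`, the comment wording, and the 5-second timeout are all assumptions made for the example, and the fault-aware editor model itself is not shown.

```python
import subprocess
import sys


def supplementary_comment(code: str, example_input: str,
                          expected_output: str, timeout: float = 5.0) -> str:
    """Run `code` on one example test and summarize the result as a comment.

    Hypothetical helper: the comment templates below are illustrative only.
    """
    try:
        # Execute the candidate program in a separate interpreter process,
        # feeding the example input on stdin.
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=example_input,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "# Time limit exceeded on the example test."

    if proc.returncode != 0:
        # Keep only the last line of the traceback (the exception message).
        lines = proc.stderr.strip().splitlines()
        detail = lines[-1] if lines else "unknown error"
        return f"# Runtime error on the example test: {detail}"

    got = proc.stdout.strip()
    expected = expected_output.strip()
    if got == expected:
        return "# Passed the example test."
    return (f"# Wrong answer on the example test: "
            f"expected {expected!r}, got {got!r}")
```

In a generate-and-edit pipeline, the returned comment would be attached to the generated program and passed to an editor model as guidance; running untrusted generated code this way would also need sandboxing, which this sketch omits.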

    Implant Global and Local Hierarchy Information to Sequence based Code Representation Models

    Source code representation with deep learning techniques is an important research field. There have been many studies that learn sequential or structural information for code representation. But sequence-based models and non-sequence models both have their limitations. Researchers attempt to incorporate structural information into sequence-based models, but they only mine part of the token-level hierarchical structure information. In this paper, we analyze how the complete hierarchical structure influences the tokens in code sequences and abstract this influence as a property of code tokens called hierarchical embedding. The hierarchical embedding is further divided into statement-level global hierarchy and token-level local hierarchy. Furthermore, we propose the Hierarchy Transformer (HiT), a simple but effective sequence model that incorporates the complete hierarchical embeddings of source code into a Transformer model. We demonstrate the effectiveness of hierarchical embedding on learning code structure with an experiment on a variable scope detection task. Further evaluation shows that HiT outperforms SOTA baseline models and shows stable training efficiency on three source code-related tasks involving classification and generation tasks across 8 different datasets. Comment: Accepted by ICPC 202

    Exploring the metabolic network of the epidemic pathogen Burkholderia cenocepacia J2315 via genome-scale reconstruction

    Background: Burkholderia cenocepacia is a threatening nosocomial epidemic pathogen in patients with cystic fibrosis (CF) or a compromised immune system. Its high level of antibiotic resistance is an increasing concern in treatments against its infection. Strain B. cenocepacia J2315 is the most infectious isolate from CF patients. There is a strong demand to reconstruct a genome-scale metabolic network of B. cenocepacia J2315 to systematically analyze its metabolic capabilities and its virulence traits, and to search for potential clinical therapy targets. Results: We reconstructed the genome-scale metabolic network of B. cenocepacia J2315. An iterative reconstruction process led to the establishment of a robust model, iKF1028, which accounts for 1,028 genes, 859 internal reactions, and 834 metabolites. The model iKF1028 captures important metabolic capabilities of B. cenocepacia J2315 with a particular focus on the biosyntheses of key metabolic virulence factors to assist in understanding the mechanism of disease infection and identifying potential drug targets. The model was tested through BIOLOG assays. Based on the model, the genome annotation of B. cenocepacia J2315 was refined and 24 genes were properly re-annotated. Gene and enzyme essentiality were analyzed to provide further insights into the genome function and architecture. A total of 45 essential enzymes were identified as potential therapeutic targets. Conclusions: As the first genome-scale metabolic network of B. cenocepacia J2315, iKF1028 allows a systematic study of the metabolic properties of B. cenocepacia and its key metabolic virulence factors affecting the CF community. The model can be used as a discovery tool to design novel drugs against diseases caused by this notorious pathogen.

    In Silico Insights into the Symbiotic Nitrogen Fixation in Sinorhizobium meliloti via Metabolic Reconstruction

    BACKGROUND: Sinorhizobium meliloti is a soil bacterium known for its capability to establish symbiotic nitrogen fixation (SNF) with leguminous plants such as alfalfa. S. meliloti 1021 is the most extensively studied strain for understanding the mechanism of SNF and for studying the legume-microbe interaction. To provide insight into the metabolic characteristics underlying the SNF mechanism of S. meliloti 1021, there is an increasing demand to reconstruct a metabolic network for the stage of SNF in S. meliloti 1021. RESULTS: Through an iterative reconstruction process, a metabolic network during the stage of SNF in S. meliloti 1021 was presented, named iHZ565, which accounts for 565 genes, 503 internal reactions, and 522 metabolites. Subject to a newly defined objective function, the in silico predicted flux distribution was highly consistent with the in vivo evidence reported previously, which supports the robustness of the model. Based on the model, the genome annotation of S. meliloti 1021 was refined and 15 genes were properly re-annotated. Of the 565 metabolic genes included in iHZ565, 112 (19.8%) were predicted to be essential for efficient SNF in bacteroids under the in silico microaerobic and nutrient-sharing condition. CONCLUSIONS: As the first metabolic network during the stage of SNF in S. meliloti 1021, the manually curated model iHZ565 provides an overview of the major metabolic properties of the SNF bioprocess in S. meliloti 1021. The predicted SNF-required essential genes will facilitate understanding of the key functions in SNF and help identify key genes and design experiments for further validation. The model iHZ565 can be used as a knowledge-based framework for better understanding the symbiotic relationship between rhizobia and legumes, ultimately uncovering the mechanism of nitrogen fixation in bacteroids and providing new strategies to efficiently improve biological nitrogen fixation.

    Network-assisted analysis of primary Sjogren's syndrome GWAS data in Han Chinese

    Primary Sjogren's syndrome (pSS) is a complex autoimmune disorder. So far, genetic research in pSS has lagged far behind and the underlying biological mechanism is unclear. Further exploration of existing genome-wide association study (GWAS) data is urgently needed to uncover disease-related gene combination patterns. Herein, we conducted a network-based analysis by integrating pSS GWAS data in Han Chinese with a protein-protein interaction network to identify pSS candidate genes. After module detection and evaluation, 8 dense modules covering 40 genes were obtained for further functional annotation. An additional 31 MHC genes with significant gene-level P-values (sigMHC-genes) were also retained. The combined module genes and sigMHC-genes, a total of 71 genes, were denoted as pSS candidate genes. Of these pSS candidates, 14 genes had been reported to be associated with any of pSS, RA, and SLE, including STAT4, GTF2I, HLA-DPB1, HLA-DRB1, PTTG1, HLA-DQB1, MBL2, TAP2, CFLAR, NFKBIE, HLA-DRA, APOM, HLA-DQA2 and NOTCH4. This is the first report of a network-assisted analysis of pSS GWAS data to explore combined gene patterns associated with pSS. Our study suggests that network-assisted analysis is a useful approach to gaining further insights into the biology of associated genes and providing important clues for future research into pSS etiology.

    Network-Based Analysis of Schizophrenia Genome-Wide Association Data to Detect the Joint Functional Association Signals.

    Schizophrenia is a common psychiatric disorder with high heritability and complex genetic architecture. Genome-wide association studies (GWAS) have identified several significant loci associated with schizophrenia. However, the explained heritability is still low. Growing evidence has shown that schizophrenia is attributable to multiple genes with moderate effects. In-depth mining and integration of GWAS data is urgently needed to uncover disease-related gene combination patterns. Network-based analysis is a promising strategy for better interpreting GWAS and identifying disease-related network modules. We performed a network-based analysis on three independent schizophrenia GWASs using a refined analysis framework, which included a more accurate gene P-value calculation, a dynamic network module searching algorithm, and a detailed functional analysis of the obtained module genes. The analysis generated 79 modules including 238 genes, which form a highly connected subnetwork with more statistical significance than expected by chance. The result validated several reported disease genes, such as MAD1L1, MCC, SDCCAG8, VAT1L, MAPK14, MYH9 and FXYD6, and also yielded several novel candidate genes and gene-gene interactions. Pathway enrichment analysis of the module genes suggested they were enriched in several neural and immune system related pathways/GO terms, such as neurotrophin signaling pathway, synaptosome, regulation of protein ubiquitination, and antigen processing and presentation. Further crosstalk analysis revealed that these pathways/GO terms cooperate with each other, and identified several important genes that might play vital roles in connecting these functions. Our network-based analysis of schizophrenia GWASs will facilitate the understanding of genetic mechanisms of schizophrenia.

    Consistency Analysis of Large-scale Energy Storage Batteries

    With the development of large-scale electrochemical energy storage power stations, lithium-ion batteries, which have unique advantages in energy density, power density, and cycle life, are being applied in power system energy storage devices. However, behind this rapid development, many key issues remain unresolved, which may lead to various safety accidents. Therefore, it is very important to conduct consistency analysis of lithium batteries used in large-scale power systems to prepare for system safety assessment. This paper explains the causes and manifestations of inconsistency and, based on data mining algorithms, uses a charging voltage curve clustering analysis method based on subtractive clustering to evaluate the consistency of lithium-ion batteries.

    ToolCoder: Teach Code Generation Models to use API search tools

    Automatically generating source code from natural language descriptions has been a growing field of research in recent years. However, current large-scale code generation models often encounter difficulties when selecting appropriate APIs for specific contexts. These models may generate APIs that do not meet requirements or refer to non-existent APIs in third-party libraries, especially for lesser-known or private libraries. Inspired by the process of human developers using tools to search APIs, we propose ToolCoder, a novel approach that integrates API search tools with existing models to assist in code generation and API selection. To teach our model to use tools, we introduce an automated data annotation method using ChatGPT to add tool usage information into the source code data and fine-tune code generation models. During inference, we integrate API search tools into the generation process so that our model can automatically use the search tool to get suggestions when selecting an API. Our experimental results demonstrate that ToolCoder exhibits excellent performance and generalization across five public and private library code generation benchmarks, with at least 6.21% improvement on average pass@1 metrics and 9.64% improvement on average pass@10 metrics compared to state-of-the-art methods. Furthermore, we show that our relatively small ToolCoder model is comparable to one of the current best models, GPT-3.5, highlighting the potential of incorporating programming tools into the code generation process.

    Protein-protein interaction network involving all merged module genes.

    Square nodes denote the reported genes associated with schizophrenia or bipolar disorder. The color of each node is scaled to the P-value of the gene. The width of each edge is scaled to the number of times the edge is repeated in the modules. The purple, green and blue edges are interactions from MGS, Affy6 and Affy500K, respectively.